Acquiring food items with a fork poses an immense challenge to a robot-assisted feeding system, due to the wide range of material properties and visual appearances present across food groups. Deformable foods necessitate different skewering strategies than firm ones, but inferring such characteristics for several previously unseen items on a plate remains nontrivial. Our key insight is to leverage visual and haptic observations during interaction with an item to rapidly and reactively plan skewering motions. We learn a generalizable, multimodal representation for a food item from raw sensory inputs which informs the optimal skewering strategy. Given this representation, we propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it, all within a single interaction. Real-robot experiments with foods of varying levels of visual and textural diversity demonstrate that our multimodal policy outperforms baselines which do not exploit both visual and haptic cues or do not reactively plan. Across 6 plates of different food items, our proposed framework achieves 71% success over 69 skewering attempts total. Supplementary material, datasets, code, and videos are available on our website: https://sites.google.com/view/hapticvisualnet-corl22/home
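Deformable object manipulation remains a challenging task in robotics research. Conventional techniques for parameter inference and state estimation typically rely on a precise definition of the state space and its dynamics. While this is appropriate for rigid objects and robot states, it is challenging to define the state space of a deformable object and how it evolves over time. In this work, we pose the problem of inferring the physical parameters of deformable objects as a probabilistic inference task defined with a simulator. We propose a novel methodology for extracting state information from image sequences via a technique that represents the state of a deformable object as a distribution embedding. This allows noisy state observations to be incorporated directly into modern Bayesian simulation-based inference tools in a principled manner. Our experiments confirm that we can estimate posterior distributions of physical properties, such as elasticity, friction, and scale, of highly deformable objects such as cloth and ropes. Overall, our method addresses the real-to-sim problem probabilistically and helps to better represent the evolution of the state of deformable objects.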
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
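The idea of optimizing jointly over forecast scenarios can be illustrated with a toy sketch. This is not the winning team's method, which the abstract describes only as a sample average approximation over predicted scenarios; the function name `saa_best_slot`, the price figures, and the single-load setting are all hypothetical, and brute-force enumeration stands in for a real mixed integer solver.

```python
def saa_best_slot(price_scenarios, load_kwh):
    """Pick the time slot that minimizes the average cost over all scenarios.

    price_scenarios: list of scenarios, each a list of forecast prices per slot
                     (hypothetical data, one price per time slot).
    load_kwh:        energy the single schedulable activity consumes.
    """
    n_slots = len(price_scenarios[0])

    def average_cost(slot):
        # Sample average approximation: replace the unknown expected cost
        # with the mean cost over the sampled forecast scenarios.
        total = sum(scenario[slot] * load_kwh for scenario in price_scenarios)
        return total / len(price_scenarios)

    best = min(range(n_slots), key=average_cost)
    return best, average_cost(best)


# Three hypothetical price scenarios over three time slots.
scenarios = [
    [30, 10, 50],
    [20, 40, 20],
    [25, 30, 15],
]
slot, cost = saa_best_slot(scenarios, load_kwh=2)
print(slot, cost)
```

In this toy data, slot 1 is cheapest in the first scenario and slot 2 in the third, yet slot 0 has the lowest average cost, which is the kind of compromise a joint optimization over scenarios is meant to find.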
6D object pose estimation has been a research topic in computer vision and robotics. Many real-world applications, such as robot grasping, manipulation, and autonomous navigation, require the correct pose of the objects present in a scene to perform their specific task. The problem becomes even harder when the objects are placed in a cluttered scene and the level of occlusion is high. Prior works have tried to overcome this problem but could not achieve accuracy that can be considered reliable in real-world applications. In this paper, we present an architecture that, unlike prior work, is context-aware: it utilizes the context information available about the objects. Our proposed architecture treats objects separately according to their type, i.e., symmetric or non-symmetric. A deeper estimator and refiner network pair is used for non-symmetric objects than for symmetric ones, due to their intrinsic differences. Our experiments show an accuracy improvement of about 3.2% on the LineMOD dataset, which is considered a benchmark for pose estimation in occluded and cluttered scenes, over the prior state-of-the-art DenseFusion. Our results also show that the inference time is sufficient for real-time usage.
Machine learning (ML) is revolutionizing protein structural analysis, including the important subproblem of predicting protein residue contact maps, i.e., which amino-acid residues are in close spatial proximity given the amino-acid sequence of a protein. Despite recent progress in ML-based protein contact prediction, predicting contacts with a wide range of distances (commonly classified into short-, medium- and long-range contacts) remains a challenge. Here, we propose a multiscale graph neural network (GNN) based approach taking a cue from multiscale physics simulations, in which a standard pipeline involving a recurrent neural network (RNN) is augmented with three GNNs to refine predictive capability for short-, medium- and long-range residue contacts, respectively. Test results on the ProteinNet dataset show improved accuracy for contacts of all ranges using the proposed multiscale RNN+GNN approach over the conventional approach, including the most challenging case of long-range contact prediction.
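To make the prediction target concrete, the following minimal sketch builds a contact map from known 3D coordinates and labels residue pairs by range. The abstract does not state the thresholds, so the 8 Å distance cutoff and the sequence-separation bins (6 to 11 short, 12 to 23 medium, 24 or more long) are assumptions based on a commonly used convention, and the function names are hypothetical. It illustrates the labels only, not the RNN+GNN model.

```python
import math

# Assumption: two residues are "in contact" when their representative atoms
# are closer than 8 angstroms. The abstract does not give the cutoff.
CONTACT_THRESHOLD = 8.0


def contact_map(coords, threshold=CONTACT_THRESHOLD):
    """Return an n x n boolean matrix; entry [i][j] is True if i and j are in contact."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) < threshold for j in range(n)]
        for i in range(n)
    ]


def contact_range(i, j):
    """Classify a residue pair by sequence separation (assumed bin edges)."""
    separation = abs(i - j)
    if separation < 6:
        return None  # too close in sequence to count as a contact class here
    if separation < 12:
        return "short"
    if separation < 24:
        return "medium"
    return "long"


# Hypothetical coordinates for three residues, in angstroms.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (100.0, 0.0, 0.0)]
cm = contact_map(coords)
print(cm[0][1], cm[0][2], contact_range(0, 30))
```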
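Large Transformer models have achieved state-of-the-art results on natural language understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large unsupervised corpora, learning token representations and transformations relevant to modeling generally available text, and are then fine-tuned on a particular downstream task of interest. While fine-tuning is a tried-and-true method for adapting a model to a new domain (for example, question answering on a given topic), generalization remains an ongoing challenge. In this paper, we explore and evaluate Transformer model fine-tuning for personalization. In the context of generating unit tests for Java methods, we evaluate learning to personalize to a specific software project using several personalization techniques. We consider three key approaches: (i) custom fine-tuning, which allows all model parameters to be tuned; (ii) lightweight fine-tuning, which freezes most of the model's parameters and allows tuning of only the token embeddings and softmax layer, or of the final layer alone; (iii) prefix tuning, which keeps the model parameters frozen but optimizes a small project-specific prefix vector. Each of these techniques offers a trade-off between total compute cost and predictive performance, which we evaluate using code and task-specific metrics, training time, and total computational operations. We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.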
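Outlier detection is a challenging activity. Several machine learning techniques have been proposed in the literature for outlier detection. In this article, we propose a new training approach for the bidirectional GAN (BiGAN) to detect outliers. To validate the proposed approach, we use it to train a BiGAN to detect taxpayers who are manipulating their tax returns. For each taxpayer, we derive six correlation parameters and three ratio parameters from the tax returns he or she has submitted. We train a BiGAN with the proposed training approach on this nine-dimensional derived ground-truth dataset. Next, we generate the latent representation of this dataset using the $encoder$ (that is, we encode the dataset) and regenerate the dataset using the $Generator$ (that is, we decode it) by providing this latent representation as input. For each taxpayer, we compute the cosine similarity between the ground-truth data and the regenerated data. Taxpayers with lower cosine similarity are potential return manipulators. We applied our method to analyze the iron and steel taxpayers dataset provided by the Commercial Taxes Department, Government of Telangana, India.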
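The final scoring step described above can be sketched as follows. This is a minimal illustration that assumes the regenerated vectors have already been produced by a trained encoder and generator; it does not implement the BiGAN or the proposed training approach. The names `cosine_similarity` and `rank_suspects`, the taxpayer IDs, and the three-dimensional toy vectors (the abstract uses nine parameters) are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def rank_suspects(original, regenerated):
    """Return (taxpayer_id, similarity) pairs, lowest similarity first.

    original / regenerated: dicts mapping a taxpayer ID to its feature vector
    before encoding and after decoding, respectively.
    """
    scores = {
        taxpayer_id: cosine_similarity(original[taxpayer_id], regenerated[taxpayer_id])
        for taxpayer_id in original
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy data: taxpayer "A" is reconstructed well, "B" poorly.
original = {"A": [1.0, 0.0, 0.0], "B": [1.0, 1.0, 0.0]}
regenerated = {"A": [1.0, 0.0, 0.0], "B": [0.0, 0.0, 1.0]}
ranking = rank_suspects(original, regenerated)
print(ranking)
```

Taxpayers at the top of the ranking are the ones whose records the model reconstructs least faithfully, which is the signal the abstract uses to flag potential manipulators.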
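Circular trading is a form of Goods and Services Tax evasion in which a group of fraudulent taxpayers (traders) aims to mask illegal transactions by superimposing several fictitious transactions (in which little or no value is added to the goods or services) among themselves within a short period. Because the taxpayer database is vast, it is infeasible for the authorities to manually identify groups of circular traders and the illegitimate transactions they are involved in. This work uses big data analytics and graph representation techniques to propose a framework that identifies communities of circular traders and isolates the illegitimate transactions within each community. Our approach is tested on real-life data provided by the Department of Commercial Taxes, Government of Telangana, India, where we uncovered several communities of circular traders.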
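Pre-trained large language models (LLMs) such as OpenAI Codex automate significant aspects of coding by producing natural code from informal natural language (NL) intent. However, the generated code carries no correctness guarantee of satisfying the user's intent. In fact, it is hard to even define a notion of correctness, since natural language can be ambiguous and lacks formal semantics. In this paper, we take a first step towards addressing this problem by proposing the workflow of test-driven user-intent formalization (TDUIF), which leverages lightweight user feedback to jointly (a) formalize the user intent as tests (a partial specification), and (b) generate code that meets the formalized user intent. To perform a scalable, large-scale automated evaluation of the algorithms without requiring a user in the loop, we describe how to simulate user interaction with high fidelity using a reference solution. We also describe and implement alternative implementations of several algorithmic components (including mutating and ranking a set of tests) that can be composed into efficient solutions to the TDUIF problem. We have developed a system, TiCoder, that implements several solutions to TDUIF, and we compare their relative effectiveness on the MBPP academic code generation benchmark. Our results using the OpenAI Codex LLM on MBPP are promising: our best algorithm improves the pass@1 code generation accuracy metric from 48.39% to 55.48% with a single user query, and up to 85.48% with up to 5 user queries. Second, we can generate a non-trivial functional unit test consistent with the user intent within an average of 1.69 user queries for 90.40% of the examples in this dataset.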
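Anatomic tracing data provides detailed information on brain circuitry that is essential for addressing some of the common errors in diffusion MRI tractography. However, automated detection of fiber bundles on tracing data is challenging due to sectioning distortions, the presence of noise and artifacts, and intensity/contrast variations. In this work, we propose a deep learning method with a self-supervised loss function that takes anatomy-based constraints into account to accurately segment fiber bundles on tracer sections from macaque brains. In addition, given the limited availability of manual labels, we use a semi-supervised training technique that makes efficient use of unlabeled data to improve performance, along with location constraints to further reduce false positives. Evaluation of our method on unseen sections from a different macaque yields promising results, with a true positive rate of about 0.90. The code for our method is available at https://github.com/v-sundaresan/fiberbundle_seg_tracing.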